-
- >Here's the form for registering 'text/html' partly filled in, from RFC
- >1341.
-
- I strongly suggest we bring the definition of HTML into conformance
- with the SGML standard before we register it with the IANA.
-
- >Published specification:
- > "The HTTP Protocol as Implemented in W3", available for
- > anonymous ftp from ftp://info.cern.ch/pub/doc/www/http.txt.
- > Describes the HTTP interactive access protocol and the tags used
- > in HTML documents.
-
- This is the HTTP document, not the HTML document:
-
- This document defines the Hypertext Transfer protocol (HTTP) as
- currently implemented by the WorldWideWeb initiative software.
-
- The HTML document is: http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
- an old version of which is contained in http.txt.
-
- In any case, both documents mention some relationship between HTML and
- SGML which is not formally defined:
-
- The hypertext mark-up language is an SGML format. This defines the
- basic syntax used. The particular language, the set of tags and the
- rules about their use, and their significance is not part of the
- SGML standard. There being no standard on this, we have adopted a
- set which seems sensible. We call them HTML -- hypertext markup
- language. HTML is not an alternative to SGML, it is a particular
- format within the SGML rules (an SGML "DTD").
-
- The standard is very clear on this kind of thing. [I just got myself a
- copy, so I can quote it:]
-
- 4.103 (document) type declaration: A markup declaration that
- contains the formal specification of a document type
- definition.
-
- 4.104 document type declaration subset: The element, entity,
- and short reference sets occurring within the declaration
- subset of a document type declaration.
-
- 4.105 document (type) definition: Rules, determined by an
- application, that apply SGML to the markup of documents of a
- particular type. A document type definition includes a formal
- specification, expressed in a document type declaration, of
- the element types, element relationships, and attributes, and
- references that can be represented by markup. It thereby
- defines the vocabulary of the markup for which SGML defines
- the syntax.
-
- So it seems that the HTML DTD is missing the "formal specification."
- I have written a document type declaration subset that matches HTML as
- currently defined and implemented, with a few exceptions (most
- importantly, the PLAINTEXT tag). See
- http://info.cern.ch/hypertext/WWW/MarkUp/HTML.dtd
-
- Most existing HTML documents need only small modifications to bring
- them into conformance (quote attribute values, add the <!DOCTYPE ...>
- prologue). And the existing WWW browser parses conforming documents
- just fine.
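-
- As a rough illustration of the first modification (quoting attribute
- values), here is a minimal Python sketch. The function name and the
- regular expressions are assumptions made for this example, not part of
- any WWW software, and the approach is deliberately naive: it will
- mishandle a quoted value that itself contains something like " x=y"
- or a ">" character.
-
```python
import re

# Hypothetical helper, for illustration only.
# Matches a start tag such as <A HREF=foo.html NAME=top>.
TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")

# Matches " NAME=value" where the value is not already quoted.
UNQUOTED_ATTR_RE = re.compile(
    r"""(\s[A-Za-z][A-Za-z0-9.-]*\s*=\s*)([^\s"'>]+)"""
)


def quote_attribute_values(html: str) -> str:
    """Wrap unquoted attribute values inside start tags in double quotes."""

    def fix_tag(match: "re.Match[str]") -> str:
        # Only rewrite text inside the tag, never the document content.
        return UNQUOTED_ATTR_RE.sub(r'\1"\2"', match.group(0))

    return TAG_RE.sub(fix_tag, html)


if __name__ == "__main__":
    print(quote_attribute_values("<A HREF=foo.html NAME=top>link</A>"))
```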
-
- Currently HTML documents are transmitted without the normal SGML framing
- tags, but if these are included parsers will ignore them.
-
- I don't know what "the normal SGML framing tags" are. An SGML document
- has three parts: the SGML declaration, the prologue, and the instance.
- It is common in SGML applications to use an implied SGML declaration
- and include the prologue by reference (kinda like an #include
- directive in C), but without these "framing tags," it's just not an
- SGML document.
-
- Besides, it's very little work to add the line:
-
- <!DOCTYPE HTML SYSTEM>
-
- at the beginning of HTML documents.
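-
- To show how little work that is, here is a minimal Python sketch that
- prepends the line to a document that lacks it. The function name and
- the "already has a DOCTYPE" check are assumptions made for this
- example; a real tool would want a proper SGML parser.
-
```python
# Hypothetical helper, for illustration only.
DOCTYPE_LINE = "<!DOCTYPE HTML SYSTEM>"


def add_doctype(document: str) -> str:
    """Return the document with the DOCTYPE line prepended if it is missing."""
    # Crude check: any leading <!DOCTYPE ...> counts as a prologue.
    if document.lstrip().upper().startswith("<!DOCTYPE"):
        return document
    return DOCTYPE_LINE + "\n" + document


if __name__ == "__main__":
    print(add_doctype("<TITLE>Example</TITLE>\n<H1>Example</H1>\n"))
```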
-
- More non-conforming stuff in MarkUp.html:
-
- Plaintext
-
- This tag indicates that all following text is to be taken literally, up to
- the end of the file. Plain text is designed to be represented in the same
- way as example XMP text, with fixed width character and significant line
- breaks. Format:
-
-
- <PLAINTEXT>
-
- This tag allows the rest of a file to be read efficiently without parsing.
- Its presence is an optimisation. There is no closing tag.
-
- This should be moved outside the definition of HTML. It should just be
- part of the HTTP protocol: if the server starts the response with
- <PLAINTEXT>, what you're getting is plain text, not SGML.
-
- Another problem:
-
- Example sections
-
- The text may contain any ISO Latin printable characters, including the
- tag opener, so long as it does not contain the closing tag in full.
-
- This doesn't fit in SGML. The ETAGO delimiter ("</") ends CDATA
- content, whether or not the full closing tag follows.
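-
- The difference can be sketched in Python. This example assumes the
- usual delimiter-in-context rule, under which "</" counts as ETAGO only
- when a name start character follows, simplified here to an ASCII
- letter. The function name is made up for the example.
-
```python
import re

# Assumption: ETAGO is recognized only when followed by a name start
# character, approximated here as an ASCII letter.
ETAGO_IN_CONTEXT = re.compile(r"</[A-Za-z]")


def split_cdata_content(text: str) -> tuple:
    """Split text at the point where SGML CDATA content would end.

    Returns (content, rest). The content stops at the first ETAGO,
    regardless of which tag name follows it.
    """
    match = ETAGO_IN_CONTEXT.search(text)
    if match is None:
        return text, ""
    return text[: match.start()], text[match.start():]


if __name__ == "__main__":
    content, rest = split_cdata_content("if a </ b then </B> ok</XMP>")
    print(repr(content))
    print(repr(rest))
```
-
- Under the rule quoted from MarkUp.html, the example text would run up
- to "</XMP>"; under the SGML rule sketched here it stops at "</B>".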
-
- A clarification:
-
- Paragraph
-
- This tag indicates a new paragraph. The exact representation of this
- (indentation, leading, etc) is not defined here, and may be a function of
- other tags, style sheets etc. The format is simply
-
-
- <P>
-
- (In SGML terms, paragraph elements are transmitted in minimised form).
-
- The implementation suggests that the <P> tag marks an empty element, a
- paragraph separator, rather than allowing minimization in the form of
- an omitted end tag, </P>.
-
-
-
- We could even go so far as to call WWW an SGML application:
-
- 4.279 SGML Application: Rules that apply SGML to a text
- processing application. An SGML application includes a formal
- specification of the markup constructs used in the
- application, expressed in SGML. It can also include a
- non-SGML definition of semantics, application conventions,
- and/or processing.
-
- Note 2 The formal specification of an SGML application
- constitutes the common portions of the documents processed by
- the application. These common portions are frequently made
- available as public text.
-
- In other words, ftp://info.cern.ch/pub/doc/the_www_book.txt would
- serve as the "non-SGML definition." [by the way: I could only find
- postscript and LaTeX versions of the book: no txt file.] The "common
- portion" is html.dtd (we could obtain a public text identifier for
- it...).
-
- If we want to do this (define an SGML application) section 15.5
- requires this statement to be plastered all over the place:
-
- An SGML Application Conforming to International Standard
- ISO 8879 -- Standard Generalized Markup Language
-
- If we're gonna use SGML, why not do it right?
-
- Dan
-